✨ Add DangerousWorkflow check for imposter commits. #2789
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2789      +/-   ##
==========================================
- Coverage   49.21%   49.15%   -0.06%
==========================================
  Files         158      158
  Lines       11967    12129     +162
==========================================
+ Hits         5889     5962      +73
- Misses       5709     5791      +82
- Partials      369      376       +7
Not a complete review, but I wanted to get some comments out that address some big blockers with the current implementation. This change will likely require converting Dangerous-Workflow away from a file-content-only check:
scorecard/checks/dangerous_workflow.go
Lines 28 to 32 in eb57e04
func init() {
	supportedRequestTypes := []checker.RequestType{
		checker.FileBased,
		checker.CommitBased,
	}
I'll also post some thoughts in the main issue for this PR that are better suited there.
clients/githubrepo/client.go
newClient := &Client{
	ctx:           client.ctx,
	repoClient:    client.repoClient,
	graphClient:   client.graphClient,
	contributors:  client.contributors,
	branches:      client.branches,
	releases:      client.releases,
	workflows:     client.workflows,
	checkruns:     client.checkruns,
	statuses:      client.statuses,
	search:        client.search,
	searchCommits: client.searchCommits,
	webhook:       client.webhook,
	languages:     client.languages,
	licenses:      client.licenses,
	tarball:       client.tarball,
}
repo, err := MakeGithubRepo(inputRepo)
if err != nil {
	return nil, err
}
if err := newClient.InitRepo(repo, commitSHA, commitDepth); err != nil {
I have concerns about what this InitRepo call will do to the state of the parent client, since all of the struct fields are pointers.
For example, InitRepo will call client.licenses.init(client.ctx, client.repourl), which will wipe any data seen already, reset some sync primitives, etc.
func (handler *licensesHandler) init(ctx context.Context, repourl *repoURL) {
	handler.ctx = ctx
	handler.repourl = repourl
	handler.errSetup = nil
	handler.once = new(sync.Once)
	handler.licenses = nil
}
There are all sorts of nasty race conditions and duplicated work waiting to happen here.
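To make the aliasing concern concrete, here is a minimal sketch in Go. The client and licensesHandler types below are trimmed-down stand-ins, not the real scorecard structs, and shallowCopy and freshCopy are hypothetical helper names. It only illustrates that copying pointer fields and then re-initializing the copy also resets the parent's handler.

```go
package main

import (
	"fmt"
	"sync"
)

// licensesHandler is a simplified stand-in for the real handler; only the
// fields needed to show the aliasing problem are kept.
type licensesHandler struct {
	once     *sync.Once
	repo     string
	licenses []string
}

// init mirrors the reset behaviour quoted above: it discards cached data.
func (h *licensesHandler) init(repo string) {
	h.repo = repo
	h.once = new(sync.Once)
	h.licenses = nil
}

// client is a hypothetical, trimmed-down client holding a handler by pointer.
type client struct {
	licenses *licensesHandler
}

// shallowCopy copies the pointer, so both clients share one handler.
func shallowCopy(c *client) *client {
	return &client{licenses: c.licenses}
}

// freshCopy allocates a new handler, leaving the parent's state untouched.
func freshCopy() *client {
	return &client{licenses: &licensesHandler{}}
}

func main() {
	parent := &client{licenses: &licensesHandler{}}
	parent.licenses.init("ossf/scorecard")
	parent.licenses.licenses = []string{"Apache-2.0"}

	// Re-initializing the shallow copy wipes the parent's cached data.
	child := shallowCopy(parent)
	child.licenses.init("actions/checkout")
	fmt.Println(parent.licenses.repo, len(parent.licenses.licenses))

	// With a fresh handler, the parent keeps whatever it had cached.
	parent.licenses.init("ossf/scorecard")
	parent.licenses.licenses = []string{"Apache-2.0"}
	other := freshCopy()
	other.licenses.init("actions/checkout")
	fmt.Println(parent.licenses.repo, len(parent.licenses.licenses))
}
```

The sketch is single-threaded; with concurrent checks the same sharing would additionally race on the handler fields.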
Refactored this to use the same handler creation logic as CreateGithubRepoClientWithTransport. PTAL!
I agree with the need for a sub- If we modify RepoClient so the struct fields are optional that could help: e.g.
In the GitHub implementation for example, the |
I think it makes sense to discuss this some more before re-implementing things.
For the repo client - good to know that the handlers are stateful. I think we could probably get a fresh client if we just refactor some of the client initialization from here: scorecard/clients/githubrepo/client.go, Lines 265 to 309 in b6362b1
The main piece we want to reuse is the authed client, which we can pick out from the client struct. I'll take a pass at it. (Also open to other ideas.)
See https://www.chainguard.dev/unchained/what-the-fork-imposter-commits-in-github-actions-and-ci-cd This borrows its implementation from https://github.com/chainguard-dev/clank to look up imposter commits for a repo. Signed-off-by: Billy Lynch <billy@chainguard.dev>
This limits the number of calls made instead of probing every branch. Signed-off-by: Billy Lynch <billy@chainguard.dev>
Signed-off-by: Billy Lynch <billy@chainguard.dev>
Signed-off-by: Billy Lynch <billy@chainguard.dev>
Signed-off-by: Billy Lynch <billy@chainguard.dev>
Looks like CommitBased / FileBased are the only two types at the moment (scorecard/checker/check_request.go, Lines 40 to 45 in eb57e04).
I assume we'll need to make a new one? Any suggestions for what this new type should be?
I meant more along the lines of removing scorecard/checks/dangerous_workflow.go Lines 28 to 32 in b6362b1
Which is a little unfortunate, as there's still plenty of functionality which works on the file content. Scorecard would benefit from some sort of "optional API call" as once a check removes support for
Agree that extracting the auth'd transport is the main bit we care about. I'll take a look at any follow-up commits later. We can always revisit the optional |
Disregard my comment about the tarball handler; it's loaded lazily, so it shouldn't be a problem.
We need to remove this because we need to make API calls to verify commit reachability for imposter commits. We may want to look into ways to break this up so that the pieces that don't need API access can still run locally. Signed-off-by: Billy Lynch <billy@chainguard.dev>
The NewClient approach looks good to me.
I've left some comments on the ref parsing, with regard to re-usable workflows (which should be vulnerable to imposter commits) as well as filtering out non commit SHA refs.
And finally, the increased quota would likely be a problem for running this check in the cron so want to get some input from @azeemshaikh38 on that one.
supportedRequestTypes := []checker.RequestType{
	checker.FileBased,
	checker.CommitBased,
}
if err := registerCheck(CheckDangerousWorkflow, DangerousWorkflow, supportedRequestTypes); err != nil {
The check is still checker.CommitBased, in my opinion.
+1
// If not, query subrepo for commit reachability.
// Make new client for referenced repo.
subclient, err := c.client.NewClient(repo, "", 0)
commitSHA probably shouldn't be "" here, perhaps clients.HeadSHA?
@@ -52,6 +53,7 @@ type RepoClient interface {
	ListStatuses(ref string) ([]Status, error)
	ListWebhooks() ([]Webhook, error)
	ListProgrammingLanguages() ([]Language, error)
	ContainsRevision(base, target string) (bool, error)
If clients are initialized at a given SHA (e.g. InitRepo and NewClient take in a commitSHA arg), do we need base as an arg to ContainsRevision?
for _, repo := range []string{
	"http://github.com/actions/checkout",
	"http://github.com/ossf-tests/scorecard-check-dangerous-workflow-e2e",
} {
	_, e := git.PlainClone(tmpDir, false, &git.CloneOptions{
		URL: repo,
	})
	Expect(e).Should(BeNil())
}
These break since the check is no longer FileBased.
	pdata *checker.DangerousWorkflowData,
) error {
	ctx := context.TODO()
	cache := &containsCache{
From the perspective of the cron, it would be nice if the cache persisted between calls. I'm curious how much overlap there would be between repos.
Not sure what the best approach would be.
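As a rough illustration of what a cache shared across calls could look like, here is a sketch in Go. The field layout of containsCache, the newContainsCache constructor, and the lookupFunc type are assumptions made for this example; the PR's actual struct is not shown in this thread. The lookup function stands in for whatever API call checks commit reachability.

```go
package main

import (
	"fmt"
	"sync"
)

// lookupFunc stands in for the API call that checks commit reachability.
type lookupFunc func(repo, sha string) (bool, error)

// containsCache memoizes reachability results per repo and SHA.
// The fields are assumptions; they are not taken from the PR.
type containsCache struct {
	mu      sync.Mutex
	results map[string]bool
}

func newContainsCache() *containsCache {
	return &containsCache{results: make(map[string]bool)}
}

// contains returns the cached answer if present, otherwise calls lookup
// and stores the result. Errors are not cached so they can be retried.
// The lock is held during lookup for simplicity, which serializes calls.
func (c *containsCache) contains(repo, sha string, lookup lookupFunc) (bool, error) {
	key := repo + "@" + sha
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.results[key]; ok {
		return v, nil
	}
	v, err := lookup(repo, sha)
	if err != nil {
		return false, err
	}
	c.results[key] = v
	return v, nil
}

func main() {
	calls := 0
	lookup := func(repo, sha string) (bool, error) {
		calls++ // count how often the (simulated) API is hit
		return true, nil
	}
	cache := newContainsCache()
	for i := 0; i < 3; i++ {
		if _, err := cache.contains("actions/checkout", "abc123", lookup); err != nil {
			fmt.Println("lookup failed:", err)
			return
		}
	}
	fmt.Println("API calls made:", calls)
}
```

How much this would save in the cron depends on how many scanned repos pin the same action at the same SHA, which is the open question above.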
For reference, running just the Dangerous-Workflow check on ossf/scorecard consumes 93 core REST quota.
39 for urllib3/urllib3
73 for tensorflow/tensorflow
It's going to be very repo-dependent, based on their CI.
	return sce.WithMessage(sce.ErrorCheckRuntime, fmt.Sprintf("unexpected repo reference: %s", s[0]))
}
repo := strings.Join(repoSplit[:2], "/")
sha := s[1]
Can we check here whether the ref is actually a commit SHA? It could also be a tag, which wouldn't need the imposter commit verification.
The branch protection check uses something like this to check for it:
// as a package level variable
var reCommitSHA = regexp.MustCompile("^[a-f0-9]{40}$")
...
// when testing
if !reCommitSHA.MatchString(foo) {
	continue
}
We would also need to force sha to lowercase, or account for it in the regex.
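Putting the two suggestions together, a small sketch in Go of a case-insensitive check for a full 40-character SHA-1 commit hash. The helper name isCommitSHA is hypothetical; the regular expression is the one quoted above, and lowercasing the input first is one of the two options mentioned (the other being a regex that also accepts A-F).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// reCommitSHA matches a full, lowercase 40-character SHA-1 hex digest.
var reCommitSHA = regexp.MustCompile("^[a-f0-9]{40}$")

// isCommitSHA reports whether ref looks like a full commit SHA.
// The input is lowercased first so that uppercase hex digits are accepted.
func isCommitSHA(ref string) bool {
	return reCommitSHA.MatchString(strings.ToLower(ref))
}

func main() {
	refs := []string{
		"aa0ccd0b1d5255d79a7ba32fd729a1db93d2f124", // full SHA
		"AA0CCD0B1D5255D79A7BA32FD729A1DB93D2F124", // same SHA, uppercase
		"v1.4.0", // tag
		"main",   // branch
	}
	for _, ref := range refs {
		fmt.Printf("%s -> %v\n", ref, isCommitSHA(ref))
	}
}
```

Refs that fail this test (tags, branches, abbreviated SHAs) would simply be skipped before any reachability lookup is made.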
for _, job := range workflow.Jobs {
	for _, step := range job.Steps {
This doesn't cover re-usable workflows (I assume they are also vulnerable). Would it make sense to collect the Uses.Value fields from both jobs (job.WorkflowCall.Uses.Value) and steps into a slice, and then run the same analysis over everything in the slice?
https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_iduses
I was trying to check how the code handled this job and it wasn't being checked:
scorecard/.github/workflows/goreleaser.yaml
Lines 69 to 75 in b6362b1
  provenance:
    needs: [goreleaser]
    permissions:
      actions: read # To read the workflow path.
      id-token: write # To sign the provenance.
      contents: write # To add assets to a release.
    uses: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v1.4.0
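The suggestion above could be sketched roughly as follows in Go. The job and step types here are simplified, hypothetical stand-ins for the parsed workflow structures (they are not the parser's real types), and collectUses is an illustrative name. The idea is just to gather job-level and step-level uses references into one slice so the same analysis runs over both.

```go
package main

import "fmt"

// step is a simplified stand-in for a parsed workflow step.
type step struct {
	Uses string // e.g. "actions/checkout@<ref>"; empty for run steps
}

// job is a simplified stand-in for a parsed workflow job.
type job struct {
	Uses  string // set when the job calls a reusable workflow
	Steps []step
}

// collectUses gathers every non-empty uses reference from both the job
// level (reusable workflows) and the step level (actions).
func collectUses(jobs []job) []string {
	var refs []string
	for _, j := range jobs {
		if j.Uses != "" {
			refs = append(refs, j.Uses)
		}
		for _, s := range j.Steps {
			if s.Uses != "" {
				refs = append(refs, s.Uses)
			}
		}
	}
	return refs
}

func main() {
	jobs := []job{
		{Steps: []step{{Uses: "actions/checkout@v3"}, {}}},
		{Uses: "slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v1.4.0"},
	}
	for _, ref := range collectUses(jobs) {
		fmt.Println(ref)
	}
}
```

With this shape, the provenance job quoted above would contribute its reusable workflow reference to the same slice as ordinary action steps.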
Reusable workflows can't be hash pinned. Would be interesting to check whether the imposter commit holds true for reusable workflows too.
Taken from the docs:
{ref} can be a SHA, a release tag, or a branch name.
And an experiment I did last year for a different reason
https://github.com/spencerschrock/reusable-workflow-caller/blob/aa0ccd0b1d5255d79a7ba32fd729a1db93d2f124/.github/workflows/scorecard.yml#L30
Some high-level comments about code architecture:
- We could avoid changes to the RepoClient interface altogether and instead inject a new dependency in the RunScorecard fn.
- For the cron job, we can later optimize this injected dependency with a more API-efficient model (like a blobstore-based cache maybe) while the regular CLI continues to use the REST API.
- Flag-guard this change so that it does not get rolled out to prod without some testing on our end.
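One way to read these three points together is sketched below in Go. Everything here is hypothetical: the revisionChecker interface, the staticChecker fake, the pinnedRef type, and the enabled flag are illustration only and do not reflect scorecard's actual API or how RunScorecard is wired. The sketch shows a narrow dependency passed in by the caller, which could be backed by REST in the CLI and by something cheaper in the cron, with the whole lookup skipped when the flag is off.

```go
package main

import "fmt"

// revisionChecker is a hypothetical injected dependency that answers whether
// a commit is reachable from a branch or tag of the given repo.
type revisionChecker interface {
	ContainsRevision(repo, sha string) (bool, error)
}

// staticChecker is an in-memory fake; a real implementation might call the
// REST API (CLI) or read from a blobstore-backed cache (cron).
type staticChecker map[string]bool

func (s staticChecker) ContainsRevision(repo, sha string) (bool, error) {
	return s[repo+"@"+sha], nil
}

// pinnedRef is a repo plus the commit SHA a workflow pins it to.
type pinnedRef struct {
	Repo, SHA string
}

// findImposters returns refs whose SHA is not reachable in the named repo.
// When enabled is false (the flag guard), no lookups are made at all.
func findImposters(refs []pinnedRef, rc revisionChecker, enabled bool) ([]pinnedRef, error) {
	if !enabled {
		return nil, nil
	}
	var imposters []pinnedRef
	for _, r := range refs {
		ok, err := rc.ContainsRevision(r.Repo, r.SHA)
		if err != nil {
			return nil, fmt.Errorf("checking %s@%s: %w", r.Repo, r.SHA, err)
		}
		if !ok {
			imposters = append(imposters, r)
		}
	}
	return imposters, nil
}

func main() {
	known := staticChecker{"actions/checkout@good": true}
	refs := []pinnedRef{
		{Repo: "actions/checkout", SHA: "good"},
		{Repo: "actions/checkout", SHA: "forked"},
	}
	found, err := findImposters(refs, known, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("imposters:", found)
}
```

Because the check only sees the interface, swapping the backing implementation later would not require touching RepoClient.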
@@ -30,6 +30,7 @@ const HeadSHA = "HEAD"
// RepoClient interface is used by Scorecard checks to access a repo.
type RepoClient interface {
	InitRepo(repo Repo, commitSHA string, commitDepth int) error
	NewClient(repo string, commitSHA string, commitDepth int) (RepoClient, error)
I'm not sure I understand the use case for the NewClient API. Could we not use CreateGitHubRepoClient instead?
Stale pull request message
What kind of change does this PR introduce?
Adds DangerousWorkflow check for imposter commits.
See https://www.chainguard.dev/unchained/what-the-fork-imposter-commits-in-github-actions-and-ci-cd
This borrows its implementation from
https://github.com/chainguard-dev/clank to look up imposter commits for
a repo.
(Is it a bug fix, feature, docs update, something else?)
What is the current behavior?
n/a (new check)
What is the new behavior (if this is a feature change)?
Which issue(s) this PR fixes
Fixes #2733
Special notes for your reviewer
The GitHub e2e test is failing because the Actions YAML it tests against isn't actually valid (it references non-existent repos). We should fix this, but I figured I'd get this PR out for review.
Does this PR introduce a user-facing change?
For user-facing changes, please add a concise, human-readable release note to the release-note
(In particular, describe what changes users might need to make in their application as a result of this pull request.)